200 research outputs found

    Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations

    Load imbalance pervasively exists in distributed deep learning training systems, either caused by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD) achieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at every training step. In this paper, we propose eager-SGD, which relaxes the global synchronization for decentralized accumulation. To implement eager-SGD, we propose to use two partial collectives: solo and majority. With solo allreduce, the faster processes contribute their gradients eagerly without waiting for the slower processes, whereas with majority allreduce, at least half of the participants must contribute gradients before continuing, all without using a central parameter server. We theoretically prove the convergence of the algorithms and describe the partial collectives in detail. Experimental results on load-imbalanced environments (CIFAR-10, ImageNet, and UCF101 datasets) show that eager-SGD achieves 1.27x speedup over the state-of-the-art synchronous SGD, without losing accuracy. Comment: Published in Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'20), pp. 45-61. 202
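
    The difference between the solo and majority collectives described in this abstract can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions, not the paper's implementation: the function names (`partial_allreduce`, `solo_allreduce`, `majority_allreduce`), the `ready` flags, and the choice to simply omit late workers from the sum are all hypothetical.

```python
import math


def partial_allreduce(grads, ready, min_ready):
    """Sum the gradients of the workers that are ready.

    grads: one gradient vector (list of floats) per worker.
    ready: one bool per worker, True if that worker has finished its step.
    min_ready: minimum number of contributors required to proceed.
    Assumption: late workers are simply left out of this round's sum.
    """
    contributors = [g for g, r in zip(grads, ready) if r]
    if len(contributors) < min_ready:
        raise RuntimeError("not enough contributors to proceed")
    # Element-wise sum across the contributing workers.
    return [sum(vals) for vals in zip(*contributors)]


def solo_allreduce(grads, ready):
    # Solo: a single fast worker is enough to trigger the reduction.
    return partial_allreduce(grads, ready, 1)


def majority_allreduce(grads, ready):
    # Majority: at least half of the participants must contribute.
    return partial_allreduce(grads, ready, math.ceil(len(grads) / 2))


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(solo_allreduce(grads, [True, False, False, False]))
    print(majority_allreduce(grads, [True, False, True, False]))
```

    With four workers, the solo variant proceeds as soon as one worker is ready, while the majority variant in this sketch refuses to proceed until at least two are.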

    SparCML: High-Performance Sparse Communication for Machine Learning

    Applying machine learning techniques to the quickly growing data in science and industry requires highly-scalable algorithms. Large datasets are most commonly processed "data parallel" distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication and thus scalability bottleneck for most machine learning workloads. We observe that frequently, many gradient values are (close to) zero, leading to sparse or sparsifiable communications. To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations, by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SparCML, extends MPI to support additional features, such as non-blocking (asynchronous) operations and low-precision data representations. As such, SparCML and its techniques will form the basis of future highly-scalable machine learning frameworks.
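
    The idea of an allreduce over sparse contributions can be sketched in a few lines. The following is a minimal single-process illustration, assuming each node's gradient is represented as an index-to-value dictionary; the names `sparsify` and `sparse_allreduce` and the threshold-based sparsification are hypothetical and are not the SparCML API.

```python
def sparsify(dense, threshold):
    """Keep only entries whose magnitude exceeds the threshold.

    Returns a dict mapping index -> value (an assumed sparse format).
    """
    return {i: v for i, v in enumerate(dense) if abs(v) > threshold}


def sparse_allreduce(contributions):
    """Sum sparse vectors given as index -> value dicts.

    Only indices that appear in at least one contribution are stored,
    so the cost depends on the number of non-zeros, not the full length.
    """
    total = {}
    for contribution in contributions:
        for index, value in contribution.items():
            total[index] = total.get(index, 0.0) + value
    return total


if __name__ == "__main__":
    node_a = sparsify([0.0, 0.5, 1e-9, -2.0], threshold=1e-6)
    node_b = sparsify([4.0, 0.0, 0.0, 1.0], threshold=1e-6)
    print(sparse_allreduce([node_a, node_b]))
```

    In a real distributed setting the contributions would be exchanged between processes rather than summed in one place; the sketch only shows why near-zero gradient entries reduce the amount of data that has to be combined.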

    Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

    Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This, coupled with new application workloads brought forward by Deep Learning frameworks like Caffe and Microsoft CNTK, poses additional design constraints due to very large message communication of GPU buffers during the training phase. In this context, special-purpose libraries like NVIDIA NCCL have been proposed for GPU-based collective communication on dense GPU systems. In this paper, we propose a pipelined chain (ring) design for the MPI_Bcast collective operation along with an enhanced collective tuning framework in MVAPICH2-GDR that enables efficient intra-/inter-node multi-GPU communication. We present an in-depth performance landscape for the proposed MPI_Bcast schemes along with a comparative analysis of NVIDIA NCCL Broadcast and NCCL-based MPI_Bcast. The proposed designs for MVAPICH2-GDR enable up to 14X and 16.6X improvement, compared to NCCL-based solutions, for intra- and inter-node broadcast latency, respectively. In addition, the proposed designs provide up to 7% improvement over NCCL-based solutions for data parallel training of the VGG network on 128 GPUs using Microsoft CNTK. Comment: 8 pages, 3 figure
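
    A pipelined chain broadcast of the general kind named in this abstract can be simulated in one process to show why chunking helps. This is a hedged sketch, not the MVAPICH2-GDR design: the function name `pipelined_chain_bcast`, the step model (one chunk moves one hop per step), and the chunk size are assumptions made for illustration.

```python
def pipelined_chain_bcast(message, num_ranks, chunk_size):
    """Simulate a chain broadcast where rank r forwards chunks to rank r + 1.

    Returns (per-rank received messages, number of communication steps).
    Assumption: in one step every rank can receive one chunk from its
    predecessor while forwarding an earlier chunk to its successor.
    """
    chunks = [message[i:i + chunk_size]
              for i in range(0, len(message), chunk_size)]
    buffers = [[] for _ in range(num_ranks)]
    buffers[0] = list(chunks)  # rank 0 is the root and holds everything
    steps = 0
    while any(len(b) < len(chunks) for b in buffers):
        # Walk the chain from the tail so each chunk advances one hop per step.
        for rank in range(num_ranks - 1, 0, -1):
            next_chunk = len(buffers[rank])
            if next_chunk < len(buffers[rank - 1]):
                buffers[rank].append(buffers[rank - 1][next_chunk])
        steps += 1
    return [b"".join(b) for b in buffers], steps


if __name__ == "__main__":
    received, steps = pipelined_chain_bcast(b"abcdefgh", num_ranks=4, chunk_size=2)
    print(received, steps)
```

    Under this step model, broadcasting C chunks along a chain of P ranks takes C + P - 2 steps, whereas forwarding the whole message hop by hop would pay the full message transfer time P - 1 times.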

    Kilometer-scale climate models: Prospects and challenges

    Currently major efforts are underway toward refining the horizontal resolution (or grid spacing) of climate models to about 1 km, using both global and regional climate models (GCMs and RCMs). Several groups have succeeded in conducting kilometer-scale multiweek GCM simulations and decadelong continental-scale RCM simulations. There is the well-founded hope that this increase in resolution represents a quantum jump in climate modeling, as it enables replacing the parameterization of moist convection by an explicit treatment. It is expected that this will improve the simulation of the water cycle and extreme events and reduce uncertainties in climate change projections. While kilometer-scale resolution is commonly employed in limited-area numerical weather prediction, enabling it on global scales for extended climate simulations requires a concerted effort. In this paper, we exploit an RCM that runs entirely on graphics processing units (GPUs) and show examples that highlight the prospects of this approach. A particular challenge addressed in this paper relates to the growth in output volumes. It is argued that the data avalanche of high-resolution simulations will make it impractical or impossible to store the data. Rather, repeating the simulation and conducting online analysis will become more efficient. A prototype of this methodology is presented. It makes use of a bit-reproducible model version that ensures reproducible simulations across hardware architectures, in conjunction with a data virtualization layer as a common interface for output analyses. An assessment of the potential of these novel approaches will be provided

    Increased neutrophil-lymphocyte ratio is a poor prognostic factor in patients with primary operable and inoperable pancreatic cancer

    Background: The neutrophil-lymphocyte ratio (NLR) has been proposed as an indicator of systemic inflammatory response. Previous findings from small-scale studies revealed conflicting results about its independent prognostic significance with regard to different clinical end points in pancreatic cancer (PC) patients. Therefore, the aim of our study was the external validation of the prognostic significance of NLR in a large cohort of PC patients. Methods: Data from 371 consecutive PC patients, treated between 2004 and 2010 at a single centre, were evaluated retrospectively. The whole cohort was stratified into two groups according to the treatment modality. Group 1 comprised 261 patients with inoperable PC at diagnosis and group 2 comprised 110 patients with surgically resected PC. Cancer-specific survival (CSS) was assessed using the Kaplan–Meier method. To evaluate the independent prognostic significance of the NLR, the modified Glasgow prognostic score (mGPS) and the platelet-lymphocyte ratio, univariate and multivariate Cox regression models were applied. Results: Multivariate analysis identified increased NLR as an independent prognostic factor for inoperable PC patients (hazard ratio (HR)=2.53, confidence interval (CI)=1.64–3.91, P<0.001) and surgically resected PC patients (HR=1.61, CI=1.02–2.53, P=0.039). In inoperable PC patients, the mGPS was associated with poor CSS only in univariate analysis (HR=1.44, CI=1.04–1.98). Conclusion: Risk prediction for cancer-related end points using NLR does add independent prognostic information to other well-established prognostic factors in patients with PC, regardless of the therapeutic modality undergone. Thus, the NLR should be considered for future individual risk assessment in patients with PC

    Report from the OECI Oncology Days 2014

    The 2014 OECI Oncology Days was held at the ‘Prof. Dr. Ion Chiricuta’ Oncology Institute in Cluj, Romania, from 12 to 13 June. The focus of this year’s gathering was on developments in personalised medicine and other treatment advances which have made the cost of cancer care too high for many regions throughout Europe

    Outcomes in randomised controlled trials in prevention and management of carious lesions: a systematic review

    Background: Inconsistent outcome reporting is one significant hurdle to combining results from trials into systematic reviews. Core outcome sets (COS) can reduce this barrier. The aim of this review was to map outcomes reported in caries prevention and management randomised controlled trials (RCT) as a first step to COS development. We also investigated RCT characteristics and reporting of primary outcomes and sample size calculations. Methods: PubMed, Embase, Web of Knowledge and Cochrane CENTRAL were systematically searched (1 January 1968 to 25 August 2015). Inclusion criteria: RCTs comparing any technique for prevention or management of caries with another or placebo and RCTs comparing interventions to support patients undergoing treatment of caries (without setting, dentition or age restrictions). Categories were developed through piloting and group consensus and outcomes grouped accordingly. Results: Of 4773 search results, 764 were potentially relevant, full text was available for 731 papers and 605 publications met the inclusion criteria and were included. For all outcomes across the time periods 1968–1980 and 2001–2010, reporting of outcome ‘caries experience’ reduced from 39% to 18%; ‘clinical performance of the restoration’ reporting increased from 33% to 42% although there was a reduction to 22% in 2011–2015. Emerging outcome domains include ‘lesion activity’ and ‘pulp health-related outcomes’, accounting for 1% and 0%, respectively, during 1968–1980 and 10% and 4% for 2011–2015. Reporting of ‘resource efficiency’ and ‘quality of life measures’ has remained at a low level. No publications reported tooth survival independent of an index such as DMFT or equivalent. Primary outcomes were only identified as such in 414 (68%) of the reports. Conclusions: Over the past 50 years, outcome reporting for trials on prevention and management of carious lesions has tended to focus on outcomes measuring caries experience and restoration material clinical performance with lesion activity and cost-effectiveness increasingly being reported. Patient-reported and patient-focused outcomes are becoming more common (although as secondary outcomes) but remain low in use. The challenge with developing a COS will be balancing commonly previously reported outcomes against those more relevant for the future. Trial registration: PROSPERO, CRD42015025310. Registered on 14 August 2015, Trials (Schwendicke et al., Trials 16:397, 2015) and COMET initiative online (COMET, 2017)

    Cancer Induces Cardiomyocyte Remodeling and Hypoinnervation in the Left Ventricle of the Mouse Heart

    Get PDF
    Cancer is often associated with cachexia, cardiovascular symptoms and autonomic dysregulation. We tested whether extracardiac cancer directly affects the innervation of left ventricular myocardium. Mice injected with Lewis lung carcinoma cells (tumor group, TG) or PBS (control group, CG) were analyzed after 21 days. Cardiac function (echocardiography), serum levels of TNF-α and Il-6 (ELISA), structural alterations of cardiomyocytes and their innervation (design-based stereology) and levels of innervation-related mRNA (quantitative RT-PCR) were analysed. The groups did not differ in various functional parameters. Serum levels of TNF-α and Il-6 were elevated in TG. The total length of axons in the left ventricle was reduced. The number of dense core vesicles per axon profile was reduced. Decreased myofibrillar volume, increased sarcoplasmic volume and increased volume of lipid droplets were indicative of metabolic alterations of TG cardiomyocytes. In the heart, the mRNA level of nerve growth factor was reduced whereas that of β1-adrenergic receptor was unchanged in TG. In the stellate ganglion of TG, mRNA levels of nerve growth factor and neuropeptide Y were decreased and that of tyrosine hydroxylase was increased. In summary, cancer induces a systemic pro-inflammatory state, a significant reduction in myocardial innervation and a catabolic phenotype of cardiomyocytes in the mouse. Reduced expression of nerve growth factor may account for the reduced myocardial innervation